Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Providing systems the ability to relate linguistic and visual content is one
of the hallmarks of computer vision. Tasks such as text-based image retrieval
and image captioning were designed to test this ability but come with
evaluation measures that have a high variance or are difficult to interpret. We
study an alternative task for systems that match text and images: given a text
query, the system is asked to select the image that best matches the query from
a pair of semantically similar images. The system's accuracy on this Binary
Image SelectiON (BISON) task is interpretable, eliminates the reliability
problems of retrieval evaluations, and focuses on the system's ability to
understand fine-grained visual structure. We gather a BISON dataset that
complements the COCO dataset and use it to evaluate modern text-based image
retrieval and image captioning systems. Our results provide novel insights into
the performance of these systems. The COCO-BISON dataset and corresponding
evaluation code are publicly available from http://hexianghu.com/bison/.
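Because every BISON item is a binary choice, the metric reduces to counting how often the system prefers the correct image, with 0.5 as chance level. The sketch below is minimal and rests on assumptions: it uses a hypothetical record layout `(query, image_a, image_b, correct_index)` and a user-supplied `score(query, image_id)` function. Neither is the format of the released COCO-BISON files, and the word-overlap scorer is only a toy stand-in for a real text-image matching model.

```python
from typing import Callable, List, Tuple

# Hypothetical record layout (not the official COCO-BISON file format):
# (query_text, image_a_id, image_b_id, index_of_correct_image)
BisonItem = Tuple[str, str, str, int]


def bison_accuracy(items: List[BisonItem],
                   score: Callable[[str, str], float]) -> float:
    """Fraction of items for which the correct image gets the higher score."""
    if not items:
        raise ValueError("need at least one item")
    correct = 0
    for query, img_a, img_b, answer in items:
        s_a, s_b = score(query, img_a), score(query, img_b)
        predicted = 0 if s_a >= s_b else 1  # ties are broken toward image_a
        correct += int(predicted == answer)
    return correct / len(items)


# Toy scorer: word overlap between the query and a hypothetical image caption.
captions = {"img1": "a dog on a red couch", "img2": "a dog on a blue couch"}


def overlap_score(query: str, image_id: str) -> float:
    return float(len(set(query.split()) & set(captions[image_id].split())))


items = [("dog on a red couch", "img1", "img2", 0),
         ("dog on a blue couch", "img1", "img2", 1)]
print(bison_accuracy(items, overlap_score))  # prints 1.0
```

A scorer that returns a constant always picks the first image and lands at 0.5 on this balanced pair, which illustrates why the accuracy is easy to interpret against chance.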
Enhancing Domain Word Embedding via Latent Semantic Imputation
We present a novel method named Latent Semantic Imputation (LSI) to transfer
external knowledge into the semantic space for enhancing word embedding. The
method integrates graph theory to extract the latent manifold structure of the
entities in the affinity space and leverages non-negative least squares with
standard simplex constraints and the power iteration method to derive spectral
embeddings. It provides an effective and efficient approach to combining entity
representations defined in different Euclidean spaces. Specifically, our
approach generates and imputes reliable embedding vectors for low-frequency
words in the semantic space and benefits downstream language tasks that depend
on word embedding. We conduct comprehensive experiments on a carefully designed
classification problem and language modeling and demonstrate the superiority of
the enhanced embedding via LSI over several well-known benchmark embeddings. We
also confirm the consistency of the results under different parameter settings
of our method. Comment: ACM SIGKDD 201
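The imputation step can be sketched under several assumptions. First, affinity-space vectors are given for all entities. Second, the first `m` entities already have semantic embeddings. Third, each remaining entity is reconstructed from its `k` nearest affinity-space neighbors using weights constrained to the standard simplex (non-negative, summing to one). Finally, the unknown embeddings are obtained by repeatedly applying those weights (power iteration) while the known embeddings stay fixed. The simplex-constrained least-squares solver here is projected gradient descent with a Euclidean projection onto the simplex, swapped in for illustration. The paper's own solver, graph construction, and stopping criteria may differ, and `lsi_impute`, `simplex_weights`, and `project_to_simplex` are hypothetical names.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * j > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def simplex_weights(target, neighbors, steps=500):
    """Minimize 0.5 * ||neighbors.T @ w - target||^2 over the standard simplex
    by projected gradient descent (an illustrative solver, not the paper's)."""
    k = neighbors.shape[0]
    w = np.full(k, 1.0 / k)
    gram = neighbors @ neighbors.T
    b = neighbors @ target
    step = 1.0 / (np.linalg.norm(gram, 2) + 1e-12)  # 1/L for this quadratic
    for _ in range(steps):
        w = project_to_simplex(w - step * (gram @ w - b))
    return w


def lsi_impute(affinity, known_emb, k=3, iters=1000, tol=1e-10):
    """Impute embeddings for entities m..n-1 from those of entities 0..m-1."""
    n, m = affinity.shape[0], known_emb.shape[0]
    sq_dist = ((affinity[:, None, :] - affinity[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dist, np.inf)  # an entity is not its own neighbor
    W = np.zeros((n, n))
    for i in range(m, n):  # only unknown entities need reconstruction weights
        nbrs = np.argsort(sq_dist[i])[:k]
        W[i, nbrs] = simplex_weights(affinity[i], affinity[nbrs])
    X = np.zeros((n, known_emb.shape[1]))
    X[:m] = known_emb
    for _ in range(iters):  # power iteration; known rows are re-clamped each step
        X_new = W @ X
        X_new[:m] = known_emb
        converged = np.abs(X_new - X).max() < tol
        X = X_new
        if converged:
            break
    return X[m:]


# Toy data: four entities with known embeddings, one (the last) without.
affinity = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [0.25, 0.25]], dtype=float)
M = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]])
c = np.array([1.0, 0.0, -1.0])
known_emb = affinity[:4] @ M + c  # toy semantic space: an affine map of affinity
imputed = lsi_impute(affinity, known_emb)
print(np.round(imputed, 4))
```

In this toy setup the semantic space is an affine function of the affinity space and the simplex weights sum to one. The imputed row therefore comes out at approximately (1.25, 0.75, -0.25), the exact image of (0.25, 0.25). In general the power iteration only converges if every unknown entity is linked, directly or through other unknowns, to at least one known entity.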
CondenseNet: An Efficient DenseNet using Learned Group Convolutions
Deep neural networks are increasingly used on mobile devices, where
computational resources are limited. In this paper, we develop CondenseNet, a
novel network architecture with unprecedented efficiency. It combines dense
connectivity with a novel module called learned group convolution. The dense
connectivity facilitates feature re-use in the network, whereas learned group
convolutions remove connections between layers for which this feature re-use is
superfluous. At test time, our model can be implemented using standard group
convolutions, allowing for efficient computation in practice. Our experiments
show that CondenseNets are far more efficient than state-of-the-art compact
convolutional networks such as MobileNets and ShuffleNets.
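For a 1x1 convolution, running a learned group convolution with standard group convolutions at test time comes down to an equivalence that is easy to check numerically. A dense layer whose weights are masked per output group computes the same result as gathering each group's surviving input channels and applying a small per-group weight matrix. This NumPy sketch assumes the pruning mask has already been learned (how CondenseNet learns it during training is not shown) and that the output channels divide evenly into groups. The function names and channel counts are hypothetical.

```python
import numpy as np


def masked_dense_1x1(x, weight, mask):
    """Training-time view: dense 1x1 conv with pruned connections zeroed.
    x: (C_in, H, W); weight, mask: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight * mask, x)


def to_grouped(weight, mask, groups):
    """Convert a masked dense layer into per-group (input indices, weights)."""
    c_out = weight.shape[0]
    assert c_out % groups == 0, "output channels must split evenly into groups"
    per_group = c_out // groups
    group_inputs, group_weights = [], []
    for g in range(groups):
        rows = slice(g * per_group, (g + 1) * per_group)
        # input channels still used by at least one output channel of this group
        kept = np.nonzero(mask[rows].any(axis=0))[0]
        group_inputs.append(kept)
        group_weights.append((weight * mask)[rows][:, kept])
    return group_inputs, group_weights


def grouped_1x1(x, group_inputs, group_weights):
    """Test-time view: gather each group's channels, then a per-group 1x1 conv."""
    outs = [np.einsum("oc,chw->ohw", w, x[idx])
            for idx, w in zip(group_inputs, group_weights)]
    return np.concatenate(outs, axis=0)


rng = np.random.default_rng(0)
c_in, c_out, groups, kept_per_group = 8, 6, 3, 2
weight = rng.normal(size=(c_out, c_in))
mask = np.zeros((c_out, c_in))
per_group = c_out // groups
for g in range(groups):
    chosen = rng.choice(c_in, size=kept_per_group, replace=False)
    mask[g * per_group:(g + 1) * per_group, chosen] = 1.0  # shared within a group
x = rng.normal(size=(c_in, 5, 5))

dense_out = masked_dense_1x1(x, weight, mask)
idx, gw = to_grouped(weight, mask, groups)
grouped_out = grouped_1x1(x, idx, gw)
print(np.allclose(dense_out, grouped_out))        # True
print(weight.size, sum(w.size for w in gw))       # 48 12
```

With 8 input channels, 6 output channels, 3 groups, and 2 kept inputs per group, the grouped form stores 12 weights instead of 48, and the gather-then-group-conv structure maps directly onto standard group-convolution kernels.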
Deep Neuroevolution of Recurrent and Discrete World Models
Neural architectures inspired by our own human cognitive system, such as the
recently introduced world models, have been shown to outperform traditional
deep reinforcement learning (RL) methods in a variety of different domains.
Instead of the relatively simple architectures employed in most RL experiments,
world models rely on multiple different neural components that are responsible
for visual information processing, memory, and decision-making. However, so far
the components of these models have had to be trained separately and with a
variety of specialized training methods. This paper demonstrates the surprising
finding that models with the same components can instead be trained efficiently
end-to-end with a genetic algorithm (GA), reaching performance comparable to the
original world model on a challenging car racing task. An analysis of the
evolved visual and memory systems indicates that they contain effective
representations similar to those of the system trained through gradient descent.
Additionally, in contrast to gradient descent methods that
struggle with discrete variables, GAs also work directly with such
representations, opening up opportunities for classical planning in latent
space. This paper adds further evidence of the effectiveness of deep
neuroevolution for tasks that require the intricate orchestration of multiple
components in complex heterogeneous architectures.
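To make the training method concrete, below is a minimal mutation-only genetic algorithm with truncation selection and elitism, a style commonly used in deep neuroevolution. It is an illustrative assumption, not the paper's configuration. The population size, mutation scale, and toy fitness function are hypothetical stand-ins for a world model's parameters and its car-racing reward. The toy fitness rounds parameters to integers, so its gradient is zero almost everywhere. The GA only ever queries fitness values, which is why discrete or non-differentiable components pose no special problem.

```python
import numpy as np


def evolve(fitness, dim, pop_size=64, n_parents=10, sigma=0.3,
           generations=100, seed=0):
    """Mutation-only GA with truncation selection and one elite (no crossover)."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(0.0, 1.0, size=(pop_size, dim))
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]           # best individuals first
        history.append(float(scores[order[0]]))
        parents = pop[order[:n_parents]]            # truncation selection
        elite = pop[order[0]].copy()                # carried over unmutated
        picks = rng.integers(0, n_parents, size=pop_size - 1)
        children = parents[picks] + sigma * rng.normal(size=(pop_size - 1, dim))
        pop = np.vstack([elite[None, :], children])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)], history


# Toy, non-differentiable objective: parameters are rounded to integers
# ("discrete actions") before being compared with a hypothetical target.
TARGET = np.array([1, -1, 2, 0, -2, 1, 0, 1])


def toy_fitness(theta):
    return -float(np.sum((np.round(theta) - TARGET) ** 2))


best, history = evolve(toy_fitness, dim=TARGET.size)
print(history[0], history[-1], toy_fitness(best))
```

Keeping the single best individual unchanged (elitism) guarantees that, for a deterministic fitness, the best score never decreases from one generation to the next. A noisy reward, such as a stochastic racing episode, would need re-evaluation of the elite to keep that property.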
- …